Deep learning: Basics

Background information

Note: The following information applies only to tracking rodents with the Deep learning function. See also Adjust the settings for nose-tail base detection (Deep learning).

Deep learning is a type of machine learning. Machine learning is a general category that covers all sorts of mathematical tools that help a computer learn from experience. Deep learning refers specifically to the use of deep neural networks, where “deep” indicates that the networks consist of multiple hidden layers of neurons, or decision nodes.

In a deep neural network, intermediate (hidden) layers are placed between the input layer and the output layer. The input layer receives the data, for example the RGB values of the pixels that make up a bitmap picture. The output layer represents the classification categories; for example, when classifying a picture, “the picture is of a cat” or “the picture is not of a cat”.
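To make the layered structure concrete, here is a minimal sketch in plain Python of a forward pass through a small fully connected network: an input layer that takes the scaled RGB values of a tiny 4 x 4 picture, two hidden layers, and an output layer with two categories (“cat” and “not cat”). The layer sizes, the random weights, and all function names are hypothetical and chosen only for illustration. The weights are untrained, so the printed probabilities are arbitrary, and this is not the network that EthoVision XT uses.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases), hypothetical layout


def dense(inputs: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    # Each output neuron: weighted sum of all inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    # Hidden-layer activation: negative values become 0.
    return [max(0.0, v) for v in values]


def softmax(values: Sequence[float]) -> List[float]:
    # Turns the output-layer scores into probabilities that sum to 1.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Untrained (random) weights; training would adjust these values.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(pixels_rgb: List[Tuple[int, int, int]], layers: List[Layer]) -> List[float]:
    # Input layer: the RGB values of every pixel, scaled from 0-255 to 0-1.
    activations = [channel / 255.0 for pixel in pixels_rgb for channel in pixel]
    # Hidden layers.
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    # Output layer: one score per category, converted to probabilities.
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


rng = random.Random(0)
# A hypothetical 4 x 4 picture: 16 pixels with 3 RGB values each = 48 inputs.
image = [(rng.randrange(256), rng.randrange(256), rng.randrange(256)) for _ in range(16)]
layers = [random_layer(48, 16, rng), random_layer(16, 8, rng), random_layer(8, 2, rng)]

p_cat, p_not_cat = forward(image, layers)
print(f"cat: {p_cat:.3f}  not cat: {p_not_cat:.3f}")
```

Training would change only the weight and bias values, so that pictures of cats give a high value for the first output; the structure of the computation stays the same.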

Deep learning in EthoVision XT

A deep neural network can find structures in unstructured data, such as pictures and video images. It can recognize recurring patterns, such as the eyes of humans in a number of portraits, and relationships between those patterns. Because of the layered structure of decision nodes, deep neural networks can learn to represent data at various levels: from low levels, such as edges, colors and curves, to higher levels, such as semicircles (a combination of a curve and a straight edge) and squares (a combination of straight edges), up to even higher, more abstract levels (concepts), such as “handwriting”, “dark object” or “a tail”.

EthoVision XT uses a trained network: the network has learned to extract features from a number of video images of rodents of various colors against various backgrounds, in which the nose and the tail base were previously annotated. During tracking, the network analyzes a portion of the image that includes the detected subject. It creates a probability map for the nose point and another for the tail-base point, and finally estimates the positions of the nose point and the tail-base point from the locations with the highest probability.
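The final step, picking the most probable location in each probability map, can be sketched as a simple argmax search. The sketch assumes, hypothetically, that each map holds one probability per pixel of the analyzed image portion and that the position of that portion's top-left corner in the full video image is known. The function names and example values are invented for illustration and do not describe EthoVision XT's internal implementation.

```python
from typing import Dict, List, Tuple

Heatmap = List[List[float]]  # hypothetical: one probability per pixel of the analyzed portion


def peak_location(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (row, column, probability) of the highest value in the map."""
    best_row, best_col, best_p = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, p in enumerate(row):
            if p > best_p:  # strict '>' keeps the first peak if there is a tie
                best_row, best_col, best_p = r, c, p
    return best_row, best_col, best_p


def estimate_body_points(nose_map: Heatmap, tail_base_map: Heatmap,
                         crop_origin: Tuple[int, int]) -> Dict[str, Tuple[int, int, float]]:
    """Estimate nose and tail-base positions in full-frame pixel coordinates.

    crop_origin is the (x, y) position of the analyzed portion's top-left
    corner in the full video image (an assumption for this sketch).
    """
    origin_x, origin_y = crop_origin
    points = {}
    for name, heatmap in (("nose", nose_map), ("tail_base", tail_base_map)):
        row, col, p = peak_location(heatmap)
        # Columns map to x and rows map to y; shift back into full-frame coordinates.
        points[name] = (origin_x + col, origin_y + row, p)
    return points


# Hypothetical 4 x 5 probability maps for one analyzed image portion.
nose_map = [
    [0.0, 0.1, 0.1, 0.2, 0.0],
    [0.0, 0.1, 0.3, 0.9, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
tail_base_map = [
    [0.1, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.1, 0.0, 0.0, 0.0],
    [0.5, 0.2, 0.0, 0.0, 0.0],
    [0.8, 0.3, 0.1, 0.0, 0.0],
]
print(estimate_body_points(nose_map, tail_base_map, crop_origin=(100, 50)))
```

With these example maps, the sketch reports the nose at (103, 51) and the tail base at (100, 53) in full-frame coordinates. Practical pose-estimation systems often use maps at a lower resolution than the image, or refine the peak to sub-pixel precision; the sketch omits both.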

With deep learning, the detection of body points depends less on the detected contour of the subject. You can see this effect when the contrast between the subject and the background is low, as in the following picture. Here, a dark mouse is only partly detected when it rears with its forepaws on the cage wall. However, the neural network still finds the nose point.

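[Figure: a dark mouse rearing against the cage wall, only partly detected, with the nose point found by the neural network]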

Convolutional networks

EthoVision XT uses a deep Convolutional Neural Network (CNN) to find the nose point and the tail-base point in each sampled video image. CNNs are particularly suitable for classifying images based on spatial relationships. Their structure resembles the organization of cells in the visual cortex of the brain, which has small regions of cells that are sensitive to specific regions of the visual field. Hubel and Wiesel (Journal of Physiology 165: 559-568, 1963) showed that some neurons fired only in the presence of edges of a certain orientation: some neurons fired when exposed to vertical edges, others when shown horizontal or diagonal edges. Convolutional networks are based on the same idea: specialized components in the network have specific tasks, namely to look for specific characteristics in the image.
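The idea of orientation-selective components can be illustrated with two hand-made 3 x 3 filters, one that responds to vertical edges and one that responds to horizontal edges (the classic Sobel kernels). The sketch slides each filter over a small image that contains only a vertical edge. As in most deep learning libraries, the “convolution” is computed as a cross-correlation, without flipping the kernel. In a trained CNN the filter values are learned from the training images rather than designed by hand, so these kernels and function names are illustrative assumptions, not filters taken from EthoVision XT.

```python
from typing import List

Image = List[List[float]]

# Hand-made Sobel kernels (illustration only; a CNN learns its own filter values).
VERTICAL_EDGE = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -2, -1],
                   [0, 0, 0],
                   [1, 2, 1]]


def conv2d_valid(image: Image, kernel: List[List[int]]) -> Image:
    """Slide the kernel over the image (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    feature_map = []
    for r in range(h - kh + 1):
        row = []
        for c in range(w - kw + 1):
            # Weighted sum of the image patch under the kernel.
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(s)
        feature_map.append(row)
    return feature_map


def max_abs_response(feature_map: Image) -> float:
    """Strongest response of a filter anywhere in the image."""
    return max(abs(v) for row in feature_map for v in row)


# 5 x 6 image: dark left half (0), bright right half (1), i.e. one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
vertical_response = conv2d_valid(image, VERTICAL_EDGE)
horizontal_response = conv2d_valid(image, HORIZONTAL_EDGE)
print(max_abs_response(vertical_response), max_abs_response(horizontal_response))
```

Running the sketch prints `4.0 0.0`: only the vertical-edge filter responds, much like the orientation-selective neurons described by Hubel and Wiesel.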

Features detected with Deep learning

When you track subjects with Deep learning, their main features, for example the body center point, are obtained in different ways: either predicted by the neural network or calculated from the contour of the detected subject. When you track multiple subjects with Deep learning, some of these features are not calculated at all. See the tables below for reference.

One subject per arena

Feature calculated | Technique and other features used
Center-point | Contour
Nose-point | Deep learning
Tail-base point | Deep learning
Body elongation | Contour
Body area | Contour
Body changed area | Contour
Head direction line | Contour, nose-point

Where:

Body elongation is used to calculate the dependent variables Body elongation and Body elongation state.

Body area and Body changed area are used to calculate Mobility and Mobility state.

The Head direction line is used to calculate Head direction and Head directed to zone.

Two subjects per arena

Feature calculated | Technique and other features used
Center-point | Deep learning
Nose-point | Deep learning
Tail-base point | Deep learning
Body elongation | Not calculated
Body area | Not calculated
Body changed area | Not calculated
Head direction line | Not calculated

 

See also

Deep learning: Requirements